Optimal Search for the Best Alternative

Summary 

This  paper  completely  characterizes  the  solution  to  the  problem  of 
searching  for  the  best  outcome  from  alternative  sources  with  different 
characteristics.   The  optimal  strategy  is  an  elementary  reservation  price 
rule,  where  the  reservation  prices  are  easy  to  calculate  and  have  an 
intuitive  economic  interpretation. 

Introduction 

A  broad  class  of  economic  search  problems  can  be  cast  in  the  following 
form.   There  are  a  number  of  different  opportunities  or  sources,  each 
yielding  an  unknown  reward.   The  uncertainty  about  the  reward  from  a  source 
can  be  eliminated,  at  a  fee,  by  searching  or  sampling.  Each  source  has 
its  own:  independent  probability  distribution  for  the  reward;  search 
cost;  search  time.   Sources  are  sampled  sequentially,  in  whatever  order 
is  desired.   When  it  has  been  decided  to  stop  searching,  only  one  opportunity 
is  accepted,  the  maximum  sampled  reward.   Under  this  formulation,  what 
sequential  search  strategy  maximizes  expected  present  discounted  value? 

A powerful solution concept applies to the above model. Each source
is assigned a reservation price, an invariant critical number analogous
to an internal rate of return. The selection rule is to search next
that unsampled source with highest reservation price. The stopping rule
is to terminate search whenever the maximum sampled reward is above the
reservation price of every unsampled source. This simple characterization
of an optimal policy is the basic result of the present paper. Its
fundamental properties are derived and interpreted.

A Dialogue

The  research  department  of  a  certain  large  organization  has  been 
assigned  the  task  of  finding  a  new  and  better  way  to  produce  ergs.   Two 
alternative  technologies  are  being  considered,  the  benefits  of  which 
are  uncertain  and  would  not  be  known  until  development  work  is  actually 
completed. It is estimated that a production process based on the
so-called alpha technology could yield a total dollar savings of $100 with
probability 1/2 and of $70 with probability 1/2. The alternative omega
process with probability 1/5 might deliver a possible savings of $200,
but it would not offer any improvement with probability 4/5. R&D, which
must be done to remove the uncertainty, costs $15 for the alpha process
and $25 for omega. Table 1 summarizes the relevant information.


project          alpha             omega

cost             $15               $25

reward           $100    $70       $200    $0

probability      1/2     1/2       1/5     4/5

Table 1

Department  analyst  R.D.  Findingst  is  not  quite  sure  what  to  recommend. 
Either  project  looks  economical  to  Findingst,  since  both  are  positive  by 
the  expected  net  benefit  criterion  he  remembers  from  somewhere.   But  which 
alternative  should  be  researched  first?  R.D.  doesn't  recall  how  to  answer 
this  sort  of  question.   By  any  criterion  he  has  ever  heard  of,  alpha  surely 
looks  better.   It  has  lower  cost,  higher  expected  reward,  less  variance. 
Findingst  senses  that  alpha  must  come  first,  but  he  is  not  sure  why. 
Before making a final recommendation he wishes to consult with his old
friend, the famous expert on the economics of advanced erg technology,
Pandora Erda.

"Pandora," R.D. implores after explaining the situation, "should
alpha  or  omega  be  developed  first?  You  know  the  Operations  Research  guys  — 
if  I  guess  wrong  they'll  never  let  me  forget  it." 

"Don't  worry,"  assures  Pandora,  "we  can  figure  this  out.   As  you 
already  know,  researching  either  project  is  better  than  researching  no 
project.   Suppose  alpha  is  developed  first.   If  the  payoff  turns  out  to 
be  $70,  you  would  then  want  to  develop  omega,  because  the  expected  value 
of  that  project  would  then  be 

\[ -\$25 + \tfrac{1}{5}(\$200) + \tfrac{4}{5}(\$70) = \$71 \]

which  is  greater  than  the  value  at  that  point  of  not  developing  omega, 
$70.  However,  since 

\[ -\$25 + \tfrac{1}{5}(\$200) + \tfrac{4}{5}(\$100) = \$95 < \$100, \]

it  would  not  be  economical  to  develop  omega  if  alpha  had  a  $100  payoff. 

"So  the  expected  value  of  an  optimal  policy  beginning  with  developing 
alpha  is 

\[ -\$15 + \tfrac{1}{2}(\$100) + \tfrac{1}{2}\left(-\$25 + \tfrac{4}{5}(\$70) + \tfrac{1}{5}(\$200)\right) = \$70.50. \]

"A  similar  calculation  shows  the  expected  value  of  an  optimal  policy 
which  starts  by  developing  omega  is 

\[ -\$25 + \tfrac{1}{5}(\$200) + \tfrac{4}{5}\left(-\$15 + \tfrac{1}{2}(\$100) + \tfrac{1}{2}(\$70)\right) = \$71. \]

"Hey,  it's  better  to  start  with  omega!" 

"What?"  exclaims  a  stunned  R.D.   "But  alpha  looks  so  much  more 
economical.   Its  net  expected  value  is 

\[ -\$15 + \tfrac{1}{2}(\$100) + \tfrac{1}{2}(\$70) = \$70 \]

compared with omega yielding a mere

\[ -\$25 + \tfrac{1}{5}(\$200) + \tfrac{4}{5}(\$0) = \$15. \]

"If  we  had  to  pick  only  one  project  we'd  certainly  select  alpha. 
It's  worth  more  because  we'd  be  a  lot  worse  off  without  it  than  without 
omega.  So  how  come  your  conclusion  is  different?" 

"That's  just  it,"  muses  Pandora,  "there  is  a  difference  between 
the worth of a project and the order in which it should be developed.
Even  though  in  some  sense  omega  is  worth  less,  the  optimal,  sequential 
strategy  is  to  develop  it  first.   That  way  you  get  a  shot  at  the  $200 
right  away,  and  if  you  fail  there  is  always  alpha  to  fall  back  on. 
Hmm.   I  wonder  whether  there's  a  simple  selection  rule  hidden  somewhere 
in  all  this.  If  you  had  to  make  explicit  comparisons  among  project 
sequences  what  a  mess  that  would  be  in  the  general  case." 

"What  general  case?"  snaps  R.D.   "Wait  till  the  O.R.  crew  hears 
what  I  have  to  say  about  analyzing  this  case!" 
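
The arithmetic in the dialogue can be checked mechanically. The sketch below is only an illustration: the probabilities (1/2 and 1/2 for alpha, 1/5 and 4/5 for omega) are inferred from the expected values quoted in the dialogue, and the function names are hypothetical, not taken from the paper.

```python
from fractions import Fraction as F

# Each project is (cost, [(probability, reward), ...]).
# Probabilities are inferred from the dialogue's expected values.
ALPHA = (15, [(F(1, 2), 100), (F(1, 2), 70)])
OMEGA = (25, [(F(1, 5), 200), (F(4, 5), 0)])


def value_of_one_more(project, best_so_far):
    """Expected value of developing `project` and then stopping,
    when `best_so_far` can always be collected instead."""
    cost, outcomes = project
    return -cost + sum(p * max(x, best_so_far) for p, x in outcomes)


def value_starting_with(first, second):
    """Expected value of developing `first`, then developing `second`
    only when doing so beats keeping the reward already in hand."""
    cost, outcomes = first
    total = -cost
    for p, x in outcomes:
        total += p * max(x, value_of_one_more(second, x))
    return total


alpha_first = value_starting_with(ALPHA, OMEGA)
omega_first = value_starting_with(OMEGA, ALPHA)
print("alpha first:", float(alpha_first))
print("omega first:", float(omega_first))
print("alpha alone:", value_of_one_more(ALPHA, 0))
print("omega alone:", value_of_one_more(OMEGA, 0))
```

Under these assumptions the omega-first sequence comes out ahead even though alpha has the larger stand-alone value, which is the point Pandora makes.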

Pandora's  Problem 

There are $n$ closed boxes at the beginning of our scenario. Box $i$,
$1 \le i \le n$, contains a potential reward of $x_i$ with probability
distribution function $F_i(x_i)$, independent of the other rewards. It costs
$c_i$ to open box $i$ and learn its contents, which become known only after
a time lag of $t_i$. Instantaneous learning is the special case $t_i = 0$.

An initial amount $x_0$ is available, representing a fallback reward
that could always be collected if no sampling were undertaken or if
every sampled reward happened to be less than $x_0$. In many applications
it is natural to set $x_0 = 0$. All costs and benefits are converted to
present values by the discount rate $r$.

At each stage Pandora must decide whether or not to open a box.
If  she  chooses  to  stop  searching,  Pandora  collects  at  that  time  the 
maximum  reward  she  has  thus  far  uncovered.   Should  Pandora  wish  to  continue 
sampling,  she  must  select  the  next  box  to  be  opened,  pay  at  that  time 
the  fee  for  opening  it,  and  wait  for  the  outcome.   Then  will  come  the 
next  decision  stage.   Note  a  characteristic  asymmetry:   the  sum  of  search 
costs  is  paid  during  search,  whereas  the  maximum  reward  is  collected  after 
search  has  been  terminated. 

Pandora  worships  maximized  expected  present  discounted  value.   She 
needs  to  know  what  she  should  do  to  be  consistent  with  this  fundamental 
conviction.  Pandora  wants  a  sequential  decision  rule  that  will  tell  her 
at  each  stage  whether  or  not  to  continue  searching,  and  if  so,  which  box 
to  open  next. 

Pandora's problem can be formally posed in dynamic programming format.
Let the collection of $n$ boxes, denoted $I$, be partitioned into any set
$S$ of sampled boxes and its complement $\bar{S}$ of closed boxes. That is,

\[ S \cup \bar{S} = I, \qquad S \cap \bar{S} = \emptyset, \]

where

\[ I = \{1, 2, \ldots, n\}. \]

The variable $y$ will represent the maximum sampled reward (from the
opened boxes and the initial fallback reward)

\[ y = \max_{i \in S \cup \{0\}} x_i. \]

It is intuitively obvious (and easily verified) that all relevant
information about the previously opened boxes is summarized by $y$; knowing
the individual values of $x_i$ for $i \in S \cup \{0\}$ is superfluous to
making a correct decision because all probability distributions are
independent.

The state of the system at any time is given by the statistic $(\bar{S}, y)$.
Define $\Psi(\bar{S}, y)$ as the expected present discounted value of
following an optimal policy from this time on when the set of closed boxes
is $\bar{S}$ and the maximum sampled reward is $y$.

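For each subset $\bar{S}$ of $I$ and every $y$, the state valuation functions
must satisfy the fundamental recursive relation

\[ \Psi(\bar{S}, y) = \max\left\{ y,\ \max_{i \in \bar{S}} \left\{ -c_i + \beta_i \left[ \Psi(\bar{S} - \{i\}, y) \int_{-\infty}^{y} dF_i(x_i) + \int_{y}^{\infty} \Psi(\bar{S} - \{i\}, x_i)\, dF_i(x_i) \right] \right\} \right\} \tag{2} \]

where

\[ \Psi(\emptyset, y) = y \tag{3} \]

\[ \beta_i = e^{-r t_i}. \tag{4} \]

Equation (2) is just the principle of optimality for dynamic programming.
At stage $(\bar{S}, y)$ Pandora could terminate search, collecting reward $y$.
Or, she might open box $i$, for each $i \in \bar{S}$, which results in
expected discounted net gain

\[ -c_i + \beta_i \left[ \Psi(\bar{S} - \{i\}, y) \int_{-\infty}^{y} dF_i(x_i) + \int_{y}^{\infty} \Psi(\bar{S} - \{i\}, x_i)\, dF_i(x_i) \right]. \]

The value of an optimal policy at $(\bar{S}, y)$ is the maximum of these
alternatives.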

In principle, the state valuation functions $\{\Psi(\bar{S}, y)\}$ could be
recursively built up by systematic induction on the number of closed boxes.
Using (2), (3), state valuation functions could be constructed first for all
sets consisting of one closed box, then for all sets of two boxes, of three,
four, etc. The actual computation is likely to be a combinatoric task of
unwieldy proportions unless the number of boxes is very small.
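
For a handful of boxes with discrete reward distributions, the recursion can be evaluated directly. The following is a minimal sketch, not the paper's procedure: it assumes each box is described by a hypothetical tuple `(cost, lag, outcomes)` and memoizes on the pair (set of closed boxes, best reward so far). The number of states grows with the number of subsets of boxes, which is the combinatoric burden noted above.

```python
import math
from functools import lru_cache


def make_value_function(boxes, r=0.0):
    """Build the state valuation function for discrete boxes.

    boxes: dict name -> (cost, lag, [(prob, reward), ...])  (illustrative layout)
    r:     discount rate
    """

    @lru_cache(maxsize=None)
    def psi(closed, y):
        # closed: frozenset of unopened box names; y: maximum sampled reward.
        best = y  # stopping now collects y; with no boxes left this is all there is
        for i in closed:
            cost, lag, outcomes = boxes[i]
            beta = math.exp(-r * lag)
            rest = closed - {i}
            # A draw x <= y leaves the state at (rest, y); a draw x > y moves it
            # to (rest, x). Both cases are covered by max(y, x).
            continuation = sum(p * psi(rest, max(y, x)) for p, x in outcomes)
            best = max(best, -cost + beta * continuation)
        return best

    return psi


# The two projects of the dialogue, with probabilities inferred from its
# arithmetic, no time lag and no discounting.
boxes = {
    "alpha": (15, 0.0, [(0.5, 100), (0.5, 70)]),
    "omega": (25, 0.0, [(0.2, 200), (0.8, 0)]),
}
psi = make_value_function(boxes)
print(psi(frozenset(boxes), 0))
```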

At any stage $(\bar{S}, y)$, Pandora's optimal decision is that policy which
maximizes  the  right  hand  side  of  (2).   If  two  or  more  policies  tie,  it  makes 
no  difference  how  the  tie  is  broken.   Note  that  although  an  optimal  strategy 
is  implicitly  contained  in  equation  (2),  the  form  of  that  strategy  is  little 
more  than  a  complete  enumeration  of  what  to  do  in  all  possible  situations. 

The search literature has dealt extensively with the situation where,
in  effect,  all  boxes  are  identical.   For  this  special  case  the  issue  of 
choosing  which  box  to  open  does  not  arise.   The  essential  question  is 
when  to  stop.   The  answer  is:   search  continues  until  a  reward  greater 
than  some  "reservation  price"  is  discovered.   The  reservation  price  for 
sampling  with  recall  is  that  hypothetical  cutoff  value  of  the  maximum 
reward  which  would  make  it  just  equal  to  the  expected  net  gain  of  opening 
exactly  one  more  box. 

The  contribution  of  the  present  paper  is  to  show  that  with  alternative 
search  opportunities  the  optimal  policy  is  a  straightforward  analogue 
of the above idea. Each (different) box is assigned a (different) reservation
price which now serves as a basis for the optimal stopping and
selection  rule.   The  reservation  price  of  a  box  determines  its  ordinal 
ranking,  prescribing  when  it  should  be  opened  relative  to  the  other  boxes. 
Thus,  all  the  advantages  of  a  simple  rate  of  return  criterion  apply  in 
a  search  context. 

Some  Applications 

The  formulation  presented  in  this  paper  is  general  enough  to  cover, 
at a high level of abstraction, economic search models from a variety
of  settings. 


The reservation price stopping rule for identical boxes is well known and
appears in many places. See, for example, Lippman and McCall [1976] or
Landsberger and Peled [1977].

Take for example the standard job search model with wage offers
retained.   The  current  framework  allows  the  situation  where  the  job  searcher 
may  choose  to  sample  from  various  firms  having  different  characteristics. 
The  lump  sum  reward  is  most  appropriately  interpreted  as  the  discounted 
present  value  of  all  future  wages.   Search  costs,  which  presumably  include 
a  psychic  component,  are  net  of  any  side  compensation  (unemployment  benefits 
or  wages  from  a  currently  held  job).   The  possibility  of  reaccepting  current 
work  while  searching  on-the-job  is  accommodated  by  making  the  fallback 
reward $x_0$ equal to the present discounted value of the current wage.
Other  modifications  are  also  possible. 

Searching  for  the  lowest  price  on  some  commodity  available  from 
different  stores  is  also  an  example  amenable  to  the  analysis  developed 
in  this  paper.   Let  the  good  have  some  intrinsic  utility  measured  in 
dollar  terms.   The  reward  available  from  a  store  is  the  difference  between 
the  utility  of  the  commodity  and  its  price.   Search  costs  should  include 
the  opportunity  loss  of  forgoing  the  item  in  question  while  search  continues, 
as  well  as  the  more  orthodox  cost  of  visiting  a  store  to  obtain  a  price 
quotation.   The  option  of  not  buying  the  good  at  all  can  be  represented  by 
setting  the  fallback  reward  equal  to  the  (dis)utility  of  henceforth  doing 
without  the  item  altogether,  which  can  be  normalized  to  zero. 

Another  area  of  application  concerns  the  optimal  sequential  research 
strategy  for  developing  various  uncertain  technologies  to  meet  the  same  or 
a  similar  purpose.   The  reward  is  the  potential  cost  saving  of  the  new 
technology,  unknown  until  after  it  has  been  developed.   Search  fees  are 
research  and  development  expenditures.   Search  time  is  the  anticipated 
length  of  the  R&D  process.   The  option  of  choosing  to  continue  with  the 
current  known  technology  is  represented  by  having  a  zero  fallback  reward. 

There are other possible interpretations of the model, but it would
be  tedious  to  go  on  listing  them. 

The  Optimal  Strategy 

Any  closed  box  is  characterized  by:   a  fee  for  opening  it;  a  time  lag 
for  discovering  its  contents;  a  probability  distribution  for  the  reward 
it  contains.   Suppose  all  this  information  must  somehow  be  compressed  into 
a  single  index  number,  a  kind  of  internal  rate  of  return.   One  heuristic 
procedure  would  be  to  evaluate  the  intrinsic  search  value  of  a  closed  box 
by  assigning  it  the  hypothetical  reward  of  that  opened  box  to  which  it 
is  in  some  sense  equivalent. 

Suppose for the moment there are just two boxes. One is the closed
box $i$. The other is an already opened hypothetical box offering reward
$z_i$. If the searcher elects not to open box $i$, she receives the "sure
thing"

\[ z_i. \tag{5} \]

If she opens box $i$, the searcher can expect a net benefit

\[ -c_i + \beta_i \left[ z_i \int_{-\infty}^{z_i} dF_i(x_i) + \int_{z_i}^{\infty} x_i\, dF_i(x_i) \right]. \tag{6} \]

The closed and opened boxes are heuristically "equivalent" if the
searcher is just indifferent between opening box $i$ and not opening it.
This will occur if (5) and (6) are equal to each other, a condition which
can be rewritten

\[ c_i = \beta_i \int_{z_i}^{\infty} (x_i - z_i)\, dF_i(x_i) - (1 - \beta_i) z_i. \tag{7} \]

The critical number $z_i$ which satisfies (7) is called the reservation
price of box $i$.

Although the definition (7) has been motivated by heuristic considerations,
it turns out there is a rigorous sense in which all relevant information
about box $i$ is summarized by its reservation price $z_i$.
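
As an illustration of how (7) can be solved numerically, the sketch below finds the reservation price of a box with a discrete reward distribution by bisection. It assumes a strictly positive search cost so that a sign change can be bracketed; the function names and the `(prob, reward)` list layout are hypothetical choices made for this example.

```python
import math


def search_gain(z, cost, beta, outcomes):
    """Right-hand side of (7) minus the search cost, at a trial value z."""
    upside = sum(p * (x - z) for p, x in outcomes if x > z)
    return beta * upside - (1.0 - beta) * z - cost


def reservation_price(cost, outcomes, r=0.0, lag=0.0, tol=1e-9):
    """Solve (7) for z by bisection (assumes cost > 0).

    outcomes: [(prob, reward), ...] with probabilities summing to one.
    """
    beta = math.exp(-r * lag)
    lo, hi = -1.0, 1.0
    # Widen the bracket until the gain is nonnegative at lo and nonpositive at hi.
    while search_gain(lo, cost, beta, outcomes) < 0:
        lo *= 2.0
    while search_gain(hi, cost, beta, outcomes) > 0:
        hi *= 2.0
    # The gain is nonincreasing in z, so bisection closes in on the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if search_gain(mid, cost, beta, outcomes) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The dialogue's projects, with probabilities inferred from its arithmetic.
z_alpha = reservation_price(15, [(0.5, 100), (0.5, 70)])
z_omega = reservation_price(25, [(0.2, 200), (0.8, 0)])
print(round(z_alpha, 6), round(z_omega, 6))
```

With these inputs omega receives the higher reservation price, in line with the dialogue's conclusion that omega should be developed first.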

The following decision strategy, called Pandora's rule, completely
characterizes  an  optimal  policy. 

Selection  Rule:   If  a  box  is  to  be  opened,  it  should  be  that 
closed  box  with  highest  reservation  price. 

Stopping  Rule:   Terminate  search  whenever  the  maximum  sampled 
reward  exceeds  the  reservation  price  of  every  closed  box. 

What  is  remarkable  about  this  rule  is  that  the  entire  structure  of 
an  optimal  policy  has  been  reduced  to  a  simple  statement  about  reservation 
prices.  Furthermore,  the  reservation  price  of  each  box  is  calculated  by 
equating a hypothetical gain of stopping (5) not with the full gain of
opening  the  box  and  continuing  on  in  an  optimal  manner,  but  rather  with 
the  myopic  gain  of  opening  the  box  and  terminating  (6).   In  other  words, 
the  reservation  price  of  a  box  depends  only  on  the  properties  of  that 
box  and  is  independent  of  all  other  search  opportunities. 

Note  that  if  Pandora  samples  from  n  identical  boxes,  the  optimal 
policy  is  to  continue  search  until  she  uncovers  a  reward  greater  than 
the  common  reservation  price  of  each  box.   In  this  special  case,  it  is 
comparatively  simple  to  prove  optimality. 

The  proof  of  Pandora's  rule,  which  is  quite  technical,  is  relegated 
to  the  final  section. 
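
Pandora's rule is short enough to state as a loop. The sketch below assumes the reservation prices have already been computed from (7) and are passed in as a dictionary; `open_box` is a hypothetical callback that reveals a box's reward. It tracks only the order of opening and the reward finally collected, not costs or discounting.

```python
def pandora_search(reservation, open_box, fallback=0.0):
    """Apply the selection and stopping rules.

    reservation: dict box -> reservation price (assumed precomputed from (7))
    open_box:    callable box -> realized reward of that box
    fallback:    initial reward available without any search
    Returns (boxes opened, in order; maximum reward in hand at the end).
    """
    closed = dict(reservation)
    best = fallback
    opened = []
    while closed:
        # Selection rule: the closed box with highest reservation price.
        candidate = max(closed, key=closed.get)
        # Stopping rule: stop once the best reward exceeds every closed
        # box's reservation price, i.e. exceeds the highest one.
        if best > closed[candidate]:
            break
        best = max(best, open_box(candidate))
        opened.append(candidate)
        del closed[candidate]
    return opened, best


# Reservation prices 70 and 75 are what (7) gives for the dialogue's projects
# under the probabilities inferred from its arithmetic.
prices = {"alpha": 70.0, "omega": 75.0}
print(pandora_search(prices, {"alpha": 100, "omega": 200}.get))
print(pandora_search(prices, {"alpha": 70, "omega": 0}.get))
```

When the best reward in hand exactly equals the highest reservation price the searcher is indifferent; this sketch continues searching in that case.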

From (7), the reservation price of a box is completely insensitive
to the probability distribution of rewards at the lower end of the tail.
Any rearrangement of the probability mass located below $z_i$ leaves $z_i$
unaltered. It is important to understand this feature. Considering that
a box could be opened at any time, the only rationale for opening it
now is the possibility of terminating further search by drawing a relatively
high reward. That is why the lower end of its reward distribution is
irrelevant to the order in which box $i$ should be sampled even though it
may well influence the value of an optimal policy by altering the likelihood
that $x_i$ will end up being the largest reward drawn.

On  the  other  hand,  as  rewards  become  more  dispersed  at  the  upper 
end  of  the  distribution,  the  reservation  price  increases  and  so  does  the 
net  benefit  of  search.   Other  things  being  equal,  it  is  optimal  to  sample 
first  from  distributions  which  are  more  spread  out  or  riskier  in  hopes 
of  striking  it  rich  early  and  ending  the  search.   This  is  a  major  result 
of  the  present  paper.   Low-probability  high-payoff  situations  should 
be  prime  candidates  for  early  investigation  even  though  they  may  have 
a  smaller  chance  of  ending  up  as  the  source  ultimately  yielding  the  maximum 
reward when search ends.

The standard comparative statics exercises performed on (7) yield
anticipated results. Reservation price decreases with: greater search
cost, increased search time, or a higher interest rate. Moving the
probability mass of rewards to the right (i.e., changing the distribution
function $F_i(x_i)$ to $G_i(x_i) \le F_i(x_i)$) makes $z_i$ larger. Thus,
although there is no necessary connection between the mean reward and the
reservation price, there is a well-defined sense in which higher rewards
increase the reservation price. Similarly, performing a mean preserving spread
on the distribution function $F_i(x_i)$ makes $z_i$ bigger. In this sense a
riskier distribution of rewards implies a higher reservation price.

Note  that  acceptance  levels  decline  with  the  duration  of  search,  as 
the  best  opportunities  are  sampled  first  and  the  poorer  ones  later. 

Because  it  is  so  easy  to  calculate  reservation  prices,  sensitivity 
analysis  is  made  especially  simple.   The  effect  on  project  ranking  (and 
hence  on  an  optimal  policy)  of  changing  such  parameters  as  search  costs, 
the  probability  distribution  of  rewards,  search  time,  or  the  interest  rate 
is  easily  determined.   It  is  also  easy  to  say  how  an  optimal  search  strategy 
changes  when  certain  opportunities  are  added  to  or  deleted  from  the 
list  of  prospective  candidates. 
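
A small numerical experiment illustrates this kind of sensitivity analysis. The sketch below is an assumption-laden illustration rather than the paper's method: for a discrete reward distribution the right-hand side of (7) is piecewise linear in the reservation price, so the root can be found exactly by scanning the rewards from the top. It assumes a positive search cost and nonnegative rewards, and the parameter values are invented for the example.

```python
import math


def reservation_price_exact(cost, outcomes, r=0.0, lag=0.0):
    """Exact solution of (7) for a discrete distribution.

    Assumes cost > 0 and rewards >= 0. outcomes: [(prob, reward), ...].
    """
    beta = math.exp(-r * lag)
    ordered = sorted(outcomes, key=lambda pair: pair[1], reverse=True)
    mass = 0.0      # probability of the rewards assumed to lie above z
    weighted = 0.0  # sum of prob * reward over those rewards
    for k, (p, x) in enumerate(ordered):
        mass += p
        weighted += p * x
        # Root of the linear piece on which exactly the top k+1 rewards count.
        z = (beta * weighted - cost) / (beta * mass + 1.0 - beta)
        floor = ordered[k + 1][1] if k + 1 < len(ordered) else -math.inf
        if z >= floor:  # the root lies on the piece where this formula is valid
            return z


base = reservation_price_exact(25, [(0.2, 200), (0.8, 0)])
costlier = reservation_price_exact(30, [(0.2, 200), (0.8, 0)])
delayed = reservation_price_exact(25, [(0.2, 200), (0.8, 0)], r=0.1, lag=1.0)
spread = reservation_price_exact(25, [(0.1, 400), (0.9, 0)])  # same mean, riskier
print(base, costlier, delayed, spread)
```

Under these invented numbers, a higher cost or a discounted delay lowers the reservation price, while a mean preserving spread raises it.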

An  Example 

To  illustrate  the  nature  of  the  solution  concept  and  indicate  what 
it  depends  on,  an  explicit  example  is  calculated  in  an  interesting  special 
case. 

Suppose for each $i$, box $i$ contains one of two outcomes: either zero
reward ("failure") with probability $1 - p_i$, or positive reward $R_i$
("success") with probability $p_i$. To keep things simple there is no
discounting ($\beta_i = 1$) and the expected net gain $p_i R_i - c_i$ is
positive.

Applying (7) to this special case yields the closed form expression

\[ z_i = \frac{p_i R_i - c_i}{p_i}. \tag{8} \]

The reservation price of a box is the expected net gain divided by
the probability of success. For the same expected net gain, that box
is opened first which offers a smaller probability of success.

The link between riskier reward distributions and higher reservation prices
has been analyzed by Kohn and Shavell [1974] for the case of identical boxes.


After ranking boxes to be opened in order of decreasing $z_i$, the
searcher moves down the list until a success is encountered. At that
point search ends because $R_i > z_i$, and $z_i$ is at least as large as the
reservation price of every box still closed.

Suppose, as a further restriction, search is for the same object,
like a new product with certain well-defined characteristics. Then all
rewards $R_i$ are identical. In that case (8) reduces to

\[ z_i = R - \frac{c_i}{p_i}. \]

The best opportunity is the one offering the highest probability of success
per dollar of search cost.
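
The two-outcome case is easy to tabulate. The following sketch applies the closed form (8) and ranks boxes accordingly; the box names and parameter values are invented for illustration, and the expected-value helper assumes a zero fallback reward and no discounting, as in this example.

```python
def two_point_reservation_price(p, reward, cost):
    """Closed form (8): expected net gain divided by probability of success."""
    return (p * reward - cost) / p


def rank_boxes(boxes):
    """boxes: dict name -> (p, R, c). Names in order of decreasing z."""
    z = {name: two_point_reservation_price(*spec) for name, spec in boxes.items()}
    return sorted(z, key=z.get, reverse=True)


def expected_value(boxes, order):
    """Expected net payoff of opening boxes in `order` until the first success
    (zero fallback reward, no discounting)."""
    value, still_searching = 0.0, 1.0
    for name in order:
        p, reward, cost = boxes[name]
        value += still_searching * (p * reward - cost)
        still_searching *= 1.0 - p
    return value


# Two hypothetical boxes with the same expected net gain of 10.
boxes = {"safe": (0.5, 40.0, 10.0), "risky": (0.25, 80.0, 10.0)}
order = rank_boxes(boxes)
print(order, expected_value(boxes, order), expected_value(boxes, order[::-1]))
```

With equal expected net gains, the box with the smaller probability of success is ranked first, and opening it first gives the larger expected payoff in this example.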

Some  Limitations 


The  purpose  of  the  model  formulated  in  this  paper  is  to  sharply 
characterize  optimal  search  among  alternative  sources  with  different 
characteristics.   Naturally  certain  "other"  aspects  of  the  optimal  search 
problem  have  been  abstracted  away. 

Many of the underlying assumptions of the present formulation are
unrealistic. There has been no provision made for: adaptive learning
about correlated probability distributions; pay-as-you-go research (with
the possibility of backing out of a project if prospects start looking
unfavorable); parallel search activity; risk aversion; incomplete or no
recall; collecting some reward before search is terminated; randomly generated
new opportunities; a binding time horizon; uncertain search costs or search
time; etc. Yet the model as a whole captures enough essential aspects
of reality that it should be useful in providing project rankings which
might serve as a rough planning guide of sorts, a kind of pre-investment
screening device, or a reference point for the numerical analysis of a
more comprehensive dynamic programming type formulation.

Some of these topics have been treated in the literature, most
typically for the symmetric case where all boxes are identical. See the
bibliography cited at the end of this paper.

The fact that it is possible to explicitly construct an optimal solution
makes the problem analyzed here a natural preliminary to more general
formulations.   And  the  present  model  may  even  be  a  reasonable  description 
of  some  situations. 

That  such  an  elementary  decision  strategy  as  Pandora's  rule  is  optimal 
depends  more  crucially  than  might  be  supposed  on  the  simplifying  assumptions 
of  the  model.   There  does  not  seem  to  be  available  a  sharp  characterization 
of  an  optimal  solution  when  certain  features  of  the  present  model  are 
changed.   Pandora's  rule  does  not  readily  generalize. 

For  example,  Pandora's  rule  is  not  optimal  if  she  may  only  open  m 
of  the  n  boxes  available  to  her.   An  example  of  this  was  provided  in  the 
second  section,  where  alpha  was  preferable  when  only  one  opportunity  could 
be searched, whereas omega was the better starting choice with the possibility
of sequential sampling from both sources. In the general case $n > m \ge 2$,
an  involved  permutational  exercise  would  be  required  to  determine  which  box 
should  be  sampled  next.   The  choice  would  depend  in  a  complicated  way 
on  the  maximum  reward  that  has  already  been  drawn  and  the  properties  of 
all  the  boxes  which  remain  unopened. 

If reward distributions were not independent, the optimal search
strategy could be very intricate. When a box is opened, the searcher
would learn not only about its contents, but also about the reward
distributions of alternative boxes. It appears plausible that other things
being equal it would be better to open a box whose reward is highly correlated
with other rewards because this adds a positive informational externality.
But translating such an effect into a simple search rule seems difficult
except in the most elementary cases.

Rothschild [1974] contains an illuminating analysis of adaptive
search policies for the case of identical boxes.

Parallel  search  efforts  and  pay-as-you-go  research  with  the  option 
of  backing  out  are  important  features  of  the  R&D  scene  omitted  from  the 
current  formulation.   They  seem  to  be  very  hard  to  model  well.   Perhaps 
it  is  wishful  thinking,  but  my  feeling  is  that  the  results  of  this  paper 
might  still  constitute  a  useful  guide  here.   Even  though  diluted  by  more 
complicated  and  realistic  considerations,  some  prescription  resembling 
Pandora's  rule  should  remain  a  prominent  feature  of  any  optimal  sequential 
search  policy. 

If  some  part  of  its  reward  can  be  collected  from  a  research  project 
before  the  sequential  search  procedure  as  a  whole  is  terminated,  that 
could  negate  Pandora's  rule  in  extreme  cases.   It  might  be  optimal  to  start 
off  with  a  cheap  low-risk  research  project  which  promises  to  supply  modest 
benefits throughout the period of sequential search for the best alternative.
Such a project is unlikely to be chosen at the end of the search,
but  it  is  developed  at  the  beginning  because  it  can  provide  a  stream  of 
interim  rewards  while  the  results  of  further  sampling  are  awaited. 

The  cases  of  sampling  without  recall,  risk  aversion,  and  randomly 
generated  new  opportunities  have  been  treated  to  some  extent  in  the 
literature.   An  optimal  policy  is  typically  complicated,  especially  when 
there  are  different  kinds  of  boxes.   What  to  do  next  will  depend  in  an 
intricate  way  on  past  results  and  future  possibilities. 


See, for example, Marschak et al. [1967].

See especially Salop [1973] who treats thoroughly the no recall
case. The other two cases are briefly surveyed for a situation with
identical boxes in Lippman and McCall [1976].

If search costs or search times are randomly distributed independently
of everything else, no changes in formulation are necessary so long as
$c_i$ and $\beta_i$ are interpreted as mean values.

Proof  of  Optimality 

First it must be shown that the formula for determining the reservation
price of a box is well defined.
Let

\[ H_i(z) = \beta_i \int_{z}^{\infty} (x_i - z)\, dF_i(x_i) - (1 - \beta_i) z. \tag{9} \]

It is easily verified that the function $H_i(z)$ is continuous and monotonic.
By taking the appropriate limits, $H_i(-\infty) = \infty$ and
$H_i(\infty) = -\infty$ ($= 0$ if $\beta_i = 1$). Thus, there exists a
solution $z_i$ to the equation

\[ c_i = H_i(z_i) \]

which is unique.
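
The monotonicity and limiting behaviour of (9) can be spot-checked numerically. The sketch below evaluates the function on a grid for one invented two-outcome distribution, with and without discounting; it is a sanity check under those assumptions, not a substitute for the argument above.

```python
import math


def H(z, beta, outcomes):
    """Equation (9) for a discrete distribution [(prob, reward), ...]."""
    upside = sum(p * (x - z) for p, x in outcomes if x > z)
    return beta * upside - (1.0 - beta) * z


def is_nonincreasing(beta, outcomes, grid):
    """True if H never rises as z moves up the grid."""
    values = [H(z, beta, outcomes) for z in grid]
    return all(a >= b for a, b in zip(values, values[1:]))


outcomes = [(0.2, 200.0), (0.8, 0.0)]          # illustrative distribution
grid = [-50.0 + 5.0 * k for k in range(61)]    # z from -50 to 250
for beta in (1.0, math.exp(-0.1)):
    print(beta, is_nonincreasing(beta, outcomes, grid),
          H(grid[0], beta, outcomes), H(grid[-1], beta, outcomes))
```

Without discounting the function flattens out at zero above the largest reward, while with discounting it keeps falling, matching the limits stated above.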

The proof of the main proposition is by induction on the number of
closed boxes. Suppose Pandora's rule is optimal with $m$ closed boxes
remaining and any value of $y$ (representing the maximum return from the
previously opened boxes and the initial fallback reward). For $m = 1$, the
optimality of Pandora's rule is easily demonstrated just by directly
applying the definition of reservation price.

Henceforth we will be considering a situation with $m+1$ closed boxes
(the set $\bar{S}$) and any value of $y$.

Let $j$ be a box with biggest reservation price in the collection of
$m+1$ closed boxes

\[ j \in \bar{S}, \qquad z_j = \max_{i \in \bar{S}} z_i. \tag{10} \]

If $y > z_j$, it is simple to demonstrate the optimality of not opening
any  boxes.   (After  one  box  is  opened,  by  Pandora's  rule  applied  to  m  closed 
boxes  it  will  be  optimal  to  stop.   Hence  the  question  is  whether  opening 
exactly  one  box  is  better  than  not  opening  any,  which  is  easily  answered 
in  the  negative.)   The  stopping  criterion  of  Pandora's  rule  is  thus  proved 
for  m+1  closed  boxes. 

If $y < z_j$, it is straightforward to show the nonoptimality of no further
search.   (Just  opening  box  j  and  then  stopping  would  yield  a  higher  expected 
present  discounted  value.)   Thus,  at  least  one  box  should  be  opened. 

Suppose (by contradiction with Pandora's rule) it is optimal to open
box k first, where $k \in S$ is any box having a lower reservation price
than j:

$$z_k < z_j . \tag{11}$$

If  box  k  is  opened  first,  by  the  induction  assumption  on  Pandora's 
rule  for  m  closed  boxes  there  is  an  exact  prescription  of  what  to  do  in 
an  optimal  policy  thereafter.   Let  the  expected  discounted  present  value 
of  opening  box  k  and  following  Pandora's  rule  thereafter,  which  is  alleged 
to  constitute  the  best  strategy,  be  B. 

Consider the following alternative. Open box j first. Let h be a
box with second biggest reservation price in the collection of m+1 closed
boxes, so that $h \in S - \{j\}$ and

$$z_h = \max_{i \in S - \{j\}} z_i . \tag{12}$$

If $x_j \geq z_h$, terminate. Otherwise, open box k next. From then on proceed
by Pandora's rule. Let the expected present discounted value of this
alternative policy be A.

The  rest  of  the  proof,  although  technical  in  its  details,  essentially 
consists  of  showing  that  A  >  B.   From  this  it  follows  that  the  originally 
proposed  policy  of  first  opening  box  k  cannot  in  fact  be  best.   It  must  be 
optimal  to  first  open  the  box  with  biggest  implicit  worth  in  S  and  then, 
from  the  induction  assumption,  to  proceed  by  Pandora's  rule  for  the  remaining 
m  closed  boxes.   But  this  is  just  Pandora's  selection  rule  for  m+1  closed 
boxes,  completing  the  induction  step. 
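The selection and stopping rules just described can be sketched as a short procedure. This is only an illustration on one realization of the rewards: the dictionary layout, the function name, and the three example boxes with their reservation prices are hypothetical and are supplied as inputs here, not computed from distributions.

```python
def pandoras_rule(boxes, fallback):
    """Run Pandora's rule on one realization.

    boxes: list of dicts with keys 'name', 'z' (reservation price),
    'cost', 'beta' and 'reward' (the draw, read only when the box is opened).
    Returns (names opened in order, reward accepted, realized discounted payoff).
    """
    best = fallback
    payoff = 0.0
    discount = 1.0
    opened = []
    # Selection rule: consider closed boxes in decreasing reservation price.
    for box in sorted(boxes, key=lambda b: b["z"], reverse=True):
        # Stopping rule: the best reward in hand covers every closed box.
        if best >= box["z"]:
            break
        payoff -= discount * box["cost"]   # the search cost is paid on opening
        discount *= box["beta"]            # the reward arrives after the search time
        best = max(best, box["reward"])
        opened.append(box["name"])
    payoff += discount * best
    return opened, best, payoff


# Hypothetical example with no discounting.
example = [
    {"name": "A", "z": 80.0, "cost": 10.0, "beta": 1.0, "reward": 0.0},
    {"name": "B", "z": 60.0, "cost": 5.0, "beta": 1.0, "reward": 70.0},
    {"name": "C", "z": 50.0, "cost": 2.0, "beta": 1.0, "reward": 100.0},
]
opened, accepted, payoff = pandoras_rule(example, fallback=0.0)
```

In this example box C is never opened even though it happens to contain the largest reward: once a reward of 70 is in hand, C's reservation price of 50 no longer justifies the search.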

The  following  notation  is  employed  (Figure  1  may  be  useful  in  providing 
a  sort  of  mnemonic  device) . 


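$$\begin{aligned}
\pi_j &= \mathrm{prob}(x_j \geq z_j), & w_j &= E[x_j \mid x_j \geq z_j], \\
\pi_k &= \mathrm{prob}(x_k \geq z_j), & w_k &= E[x_k \mid x_k \geq z_j], \\
\lambda_j &= \mathrm{prob}(z_h \leq x_j < z_j), & v_j &= E[x_j \mid z_h \leq x_j < z_j], & \tilde{v}_j &= E[\max(x_j, y) \mid z_h \leq x_j < z_j], \\
\lambda_k &= \mathrm{prob}(z_h \leq x_k < z_j), & v_k &= E[x_k \mid z_h \leq x_k < z_j], & \tilde{v}_k &= E[\max(x_k, y) \mid z_h \leq x_k < z_j], \\
\mu_k &= \mathrm{prob}(z_k \leq x_k < z_h), & u_k &= E[x_k \mid z_k \leq x_k < z_h].
\end{aligned}$$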


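Figure 1. Ranges of $x_j$ and $x_k$, with the associated probabilities and conditional expected values.

    range:         [z_k, z_h)     [z_h, z_j)                                       [z_j, ∞)
    variable:      $x_k$          $x_j$, $x_k$                                     $x_j$, $x_k$
    probability:   $\mu_k$        $\lambda_j$, $\lambda_k$                         $\pi_j$, $\pi_k$
    value(s):      $u_k$          $v_j$, $v_k$, $\tilde{v}_j$, $\tilde{v}_k$       $w_j$, $w_k$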


$$d = E[\max(x_j, x_k, y) \mid z_h \leq x_j < z_j;\ z_h \leq x_k < z_j] \tag{13}$$

$$\Phi = E[\Psi(S - \{j\} - \{k\}, \max(x_j, x_k, y)) \mid x_j < z_h;\ x_k < z_h] \tag{14}$$

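The expected present discounted value of opening box k and then
proceeding by Pandora's rule is

$$\begin{aligned}
B ={}& -c_k + \pi_k \beta_k w_k + \lambda_k \beta_k \left[ -c_j + \pi_j \beta_j w_j + \lambda_j \beta_j d + (1 - \pi_j - \lambda_j) \beta_j \tilde{v}_k \right] \\
& + (1 - \pi_k - \lambda_k) \beta_k \left[ -c_j + \pi_j \beta_j w_j + \lambda_j \beta_j \tilde{v}_j \right] + (1 - \pi_k - \lambda_k)(1 - \pi_j - \lambda_j) \beta_k \beta_j \Phi .
\end{aligned} \tag{15}$$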

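The proposed alternative is: open box j; if $x_j \geq z_h$, terminate;
if $x_j < z_h$, next open box k and then proceed by Pandora's rule. The
expected present discounted value of such a policy is

$$\begin{aligned}
A ={}& -c_j + \pi_j \beta_j w_j + \lambda_j \beta_j \tilde{v}_j + (1 - \pi_j - \lambda_j) \beta_j \left[ -c_k + \pi_k \beta_k w_k + \lambda_k \beta_k \tilde{v}_k \right] \\
& + (1 - \pi_j - \lambda_j)(1 - \pi_k - \lambda_k) \beta_j \beta_k \Phi .
\end{aligned} \tag{16}$$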

Subtracting  (15)  from  (16),  cancelling  some  terms  and  grouping  others, 

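$$\begin{aligned}
A - B ={}& (c_j - \pi_j \beta_j w_j)\big((1 - \pi_k) \beta_k - 1\big) + (c_k - \pi_k \beta_k w_k)\big(1 - (1 - \pi_j - \lambda_j) \beta_j\big) \\
& + \lambda_j \beta_j \tilde{v}_j - \lambda_k \beta_k \lambda_j \beta_j d - (1 - \pi_k - \lambda_k) \beta_k \lambda_j \beta_j \tilde{v}_j .
\end{aligned} \tag{17}$$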

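From (7),

$$c_j = \pi_j \beta_j (w_j - z_j) - (1 - \beta_j) z_j , \tag{18}$$

$$c_k = \beta_k \left[ \pi_k (w_k - z_k) + \lambda_k (v_k - z_k) + \mu_k (u_k - z_k) \right] - (1 - \beta_k) z_k . \tag{19}$$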

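Substituting in (17) for $c_j$ and $c_k$ from the above expressions yields
something that can be manipulated into the form

$$\begin{aligned}
A - B ={}& (z_j - z_k)\left[ (\pi_j \beta_j + 1 - \beta_j)(\pi_k \beta_k + 1 - \beta_k) \right] + (v_k - z_k)\left[ \lambda_k \beta_k (1 - \beta_j + \pi_j \beta_j) \right] \\
& + (\tilde{v}_j + v_k - z_k - d)\left[ \lambda_k \beta_k \lambda_j \beta_j \right] + (\tilde{v}_j - z_k)\left[ \lambda_j \beta_j (1 - \beta_k + \pi_k \beta_k) \right] \\
& + (u_k - z_k)\left[ \mu_k \beta_k (1 - \beta_j + \pi_j \beta_j + \lambda_j \beta_j) \right] .
\end{aligned} \tag{20}$$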
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From  the  definition   (13), 

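$$\begin{aligned}
d &= z_h + E[\max(\max(x_j, y) - z_h,\ x_k - z_h) \mid z_h \leq x_j < z_j;\ z_h \leq x_k < z_j] \\
&\leq z_h + E[(\max(x_j, y) - z_h) + (x_k - z_h) \mid z_h \leq x_j < z_j;\ z_h \leq x_k < z_j] \\
&= \tilde{v}_j + v_k - z_h \\
&\leq \tilde{v}_j + v_k - z_k .
\end{aligned} \tag{21}$$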

Using the above inequality, the ordering $z_k \leq z_h \leq z_j$, and the fact
that $0 < \beta_j \leq 1$, $0 < \beta_k \leq 1$, every term of expression (20) is
seen to be non-negative, with the first term strictly positive. Thus,

A  >  B  .  (22) 
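The algebra leading from the two policy values to the grouped expression for A - B is easy to get wrong by hand, so a mechanical check may be useful. The sketch below draws random rational inputs, defines the costs from the reservation-price relations (18) and (19), and compares A - B computed directly from (15) and (16) with a grouped form that keeps every term produced by the substitution, including those multiplying $(\tilde{v}_j - z_k)$ and $(u_k - z_k)$. It verifies the identity only as algebra: the random values are not required to respect the orderings or probability constraints of the model, and all variable names are illustrative.

```python
from fractions import Fraction
import random


def a_minus_b(seed):
    """Return (A - B from (15)-(16), grouped form) for random rational inputs."""
    rng = random.Random(seed)

    def frac(lo, hi):
        return Fraction(rng.randint(lo, hi), 100)

    pi_j, lam_j = frac(1, 45), frac(1, 45)
    pi_k, lam_k, mu_k = frac(1, 30), frac(1, 30), frac(1, 30)
    b_j, b_k = frac(50, 100), frac(50, 100)          # discount factors
    z_j, z_k = frac(0, 900), frac(0, 900)
    w_j, w_k, v_k, u_k = (frac(0, 900) for _ in range(4))
    vt_j, vt_k, d, phi = (frac(0, 900) for _ in range(4))

    # Costs implied by the reservation prices, as in (18) and (19).
    c_j = pi_j * b_j * (w_j - z_j) - (1 - b_j) * z_j
    c_k = (b_k * (pi_k * (w_k - z_k) + lam_k * (v_k - z_k) + mu_k * (u_k - z_k))
           - (1 - b_k) * z_k)

    r_j = (1 - pi_j - lam_j) * b_j
    r_k = (1 - pi_k - lam_k) * b_k
    # Value of opening k first, as in (15).
    B = (-c_k + pi_k * b_k * w_k
         + lam_k * b_k * (-c_j + pi_j * b_j * w_j + lam_j * b_j * d + r_j * vt_k)
         + r_k * (-c_j + pi_j * b_j * w_j + lam_j * b_j * vt_j)
         + r_k * r_j * phi)
    # Value of the alternative that opens j first, as in (16).
    A = (-c_j + pi_j * b_j * w_j + lam_j * b_j * vt_j
         + r_j * (-c_k + pi_k * b_k * w_k + lam_k * b_k * vt_k)
         + r_j * r_k * phi)

    a_j = pi_j * b_j + 1 - b_j
    a_k = pi_k * b_k + 1 - b_k
    grouped = ((z_j - z_k) * a_j * a_k
               + (v_k - z_k) * lam_k * b_k * a_j
               + (vt_j + v_k - z_k - d) * lam_k * b_k * lam_j * b_j
               + (vt_j - z_k) * lam_j * b_j * a_k
               + (u_k - z_k) * mu_k * b_k * (a_j + lam_j * b_j))
    return A - B, grouped


checks = [a_minus_b(seed) for seed in range(25)]
```

Exact rational arithmetic is used so that the two sides can be compared with equality instead of a floating-point tolerance.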

This  concludes  our  proof  of  the  form  of  an  optimal  policy. 

Strictly  speaking,  we  have  proved  the  necessity  of  Pandora's  rule. 
That  rule  specifies  a  unique  strategy  for  each  state  (except  when  there 
is  a  tie  for  the  maximum  reservation  price  of  a  closed  box,  in  which  case 
it  can  be  shown,  along  the  lines  of  the  current  proof,  that  how  the  tie 
is  broken  makes  no  difference  to  the  value  of  the  objective  function). 
Thus,  since  an  optimum  policy  exists,  sufficiency  of  Pandora's  rule  has 
also  been  demonstrated. 
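For small problems the result can also be checked numerically by comparing Pandora's rule with an exhaustive search over all sequential policies. The sketch below is a minimal illustration, assuming three hypothetical boxes with discrete rewards, full recall, and rewards small enough that the bisection bracket of plus or minus 10^6 contains every reservation price; none of the numbers come from the paper.

```python
from functools import lru_cache

# Hypothetical boxes: (search cost, discount factor, ((reward, probability), ...)).
BOXES = (
    (10.0, 0.95, ((100.0, 0.5), (0.0, 0.5))),
    (5.0, 0.90, ((60.0, 0.8), (20.0, 0.2))),
    (1.0, 1.00, ((40.0, 0.3), (30.0, 0.7))),
)


def reservation_price(cost, beta, outcomes):
    """Root of beta * E[max(x - z, 0)] - (1 - beta) * z = cost, by bisection."""
    def H(z):
        return beta * sum(p * max(x - z, 0.0) for x, p in outcomes) - (1.0 - beta) * z

    lo, hi = -1e6, 1e6            # assumed to bracket the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if H(mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


Z = [reservation_price(*box) for box in BOXES]


@lru_cache(maxsize=None)
def optimal(closed, y):
    """Best expected discounted value over all policies (exhaustive recursion)."""
    best = y                      # option of stopping with the reward in hand
    for i in closed:
        cost, beta, outcomes = BOXES[i]
        rest = tuple(j for j in closed if j != i)
        value = -cost + beta * sum(p * optimal(rest, max(y, x)) for x, p in outcomes)
        best = max(best, value)
    return best


@lru_cache(maxsize=None)
def pandora_value(closed, y):
    """Expected discounted value of following Pandora's rule."""
    if not closed:
        return y
    i = max(closed, key=lambda j: Z[j])      # selection rule
    if y >= Z[i]:                            # stopping rule
        return y
    cost, beta, outcomes = BOXES[i]
    rest = tuple(j for j in closed if j != i)
    return -cost + beta * sum(p * pandora_value(rest, max(y, x)) for x, p in outcomes)


all_boxes = (0, 1, 2)
gaps = [optimal(all_boxes, y) - pandora_value(all_boxes, y)
        for y in (0.0, 25.0, 50.0, 90.0)]
```

The exhaustive recursion grows factorially with the number of boxes, whereas Pandora's rule needs only one reservation price per box, which is the practical content of the result.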
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